Skip to content

[SPARK-22602][SQL] remove ColumnVector#loadBytes#19815

Closed
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:load-bytes
Closed

[SPARK-22602][SQL] remove ColumnVector#loadBytes#19815
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:load-bytes

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

ColumnVector#loadBytes is only used as an optimization for reading UTF8String in WritableColumnVector, this PR moves this optimization to WritableColumnVector and simplified it.

How was this patch tested?

existing test

@cloud-fan
Copy link
Copy Markdown
Contributor Author

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 24, 2017

Test build #84170 has finished for PR 19815 at commit ae7db88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 24, 2017

Test build #84172 has finished for PR 19815 at commit 3a59b32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Nov 25, 2017

I will look this Sunday.

Copy link
Copy Markdown
Member

@viirya viirya Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add comment that getUTF8String reuse the data in column vector? It seems different than other getXXX APIs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, but looks decodeToBinary will copy byte data?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems orthogonal to this issue. It would be nice if we could avoid the copy though. That would require some work on the dictionary code path.

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Nov 26, 2017

LGTM except one comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks risky if we do not make a copy.

If we plan to avoid the unnecessary data copy by this API, we should rename the API name getUTF8String and check all the callers whether they do not break the assumption.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a non-issue: the current on-heap code path already avoids making copies.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a bit better to use arrayData() instead of childColumns[0], they are practically the same, but it makes the intent a bit clearer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to

  //
  // APIs dealing with Bytes
  //

Any better names?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, use arrayData().

@gatorsmile
Copy link
Copy Markdown
Member

LGTM pending Jenkins.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 27, 2017

Test build #84203 has finished for PR 19815 at commit 5711bb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Copy Markdown
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 5a02e3a Nov 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants